Technologies for processor simulation modeling with machine learning

ABSTRACT

Technologies for processor architecture simulation with machine learning include a computing device that simulates performance of a processor executing training programs with a simulation model. The computing device captures ground truth performance statistics of the processor executing the training programs, for example using a cycle-accurate simulator. The computing device collects training simulation statistics from the simulation model and trains an error model with the training simulation statistics as a feature vector and with the ground truth performance statistics. The computing device may simulate performance of the processor executing a test program, capture test simulation statistics from the simulation model, and predict an error of the simulation model using the error model with the test simulation statistics as a feature vector. The computing device may adjust output of the simulation model or adapt execution of the simulation model based on the predicted error. Other embodiments are described and claimed.

BACKGROUND

Processor architecture performance simulation is commonly used for design, validation, and/or testing of new and existing processor architectures. Typically, cycle-accurate simulation provides accurate simulation results but requires long execution times. Application-scope simulators improve simulation speed by abstracting, approximating, or otherwise modeling performance of the processor. By improving simulation speed, an application-scope simulator may be capable of simulating execution of an entire application executing on multiple processor cores in a reasonable amount of time. Due to abstraction and/or approximation, application-scope simulators are typically not as accurate as cycle-accurate simulation.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified block diagram of at least one embodiment of a computing device for processor simulation modeling with machine learning;

FIG. 2 is a simplified block diagram of at least one embodiment of an environment that may be established by the computing device of FIG. 1;

FIG. 3 is a simplified flow diagram of at least one embodiment of a method for processor simulation modeling with machine learning that may be executed by the computing device of FIGS. 1-2;

FIG. 4 is a simplified flow diagram of at least one embodiment of a method for offline error model training that may be executed by the computing device of FIGS. 1-2;

FIG. 5 is a simplified flow diagram of at least one embodiment of a method for online error model training that may be executed by the computing device of FIGS. 1-2;

FIG. 6 is a simplified flow diagram of at least one embodiment of a method for offline simulation error correction that may be executed by the computing device of FIGS. 1-2; and

FIG. 7 is a simplified flow diagram of at least one embodiment of a method for hybrid/online simulation error correction that may be executed by the computing device of FIGS. 1-2.

DETAILED DESCRIPTION OF THE DRAWINGS

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one of A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).

In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.

Referring now to FIG. 1, in an illustrative embodiment, a computing device 100 for processor simulation modeling with machine learning is shown. In use, as described further below, the computing device 100 uses an application-level simulation model to simulate execution of multiple training programs by a simulated processor. The computing device 100 also collects ground truth simulation results for the training programs, for example from a cycle-accurate simulator. The computing device 100 trains an error model using performance statistics from the simulation model against the ground truth simulation results. The simulation model is an application-level processor simulator, and the error model is a machine learning regression model. Thus, the error model essentially learns the simulation error introduced by structures and/or other effects that are not captured by the simulation model. After error model training, the computing device 100 may use the simulation model to simulate execution of a test program, predict an error of the simulation model using the trained error model, and adjust output of the simulation model based on the predicted error. Accordingly, the computing device 100 may improve the accuracy of fast architecture-level simulation without significantly increasing simulation time. For example, a typical application-level simulation model may have an accuracy loss of about 20% compared to cycle-accurate simulation, while the computing device 100 may provide an accuracy loss of less than 10% compared to cycle-accurate simulation, without a significant decrease in simulation speed. As described below, error correction may be performed in an offline mode (after simulation) or in an online/hybrid mode (during simulation). Error correction during simulation may improve simulation results, particularly for applications that synchronize often between threads or processes.

The computing device 100 may be embodied as any type of computation or computer device capable of performing the functions described herein, including, without limitation, a computer, a server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronic device. As shown in FIG. 1, the computing device 100 illustratively includes a processor 120, an input/output subsystem 122, a memory 124, a data storage device 126, and a communication subsystem 128, and/or other components and devices commonly found in a server computer or similar computing device. Of course, the computing device 100 may include other or additional components, such as those commonly found in a server computer (e.g., various input/output devices), in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 124, or portions thereof, may be incorporated in the processor 120 in some embodiments.

The processor 120 may be embodied as any type of processor capable of performing the functions described herein. The processor 120 may be embodied as a single or multi-core processor(s), digital signal processor, microcontroller, or other processor or processing/controlling circuit. Additionally or alternatively, in some embodiments the processor 120 may be embodied as multiple processors of multiple computing devices in a datacenter. Similarly, the memory 124 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 124 may store various data and software used during operation of the computing device 100, such as operating systems, applications, programs, libraries, and drivers. The memory 124 is communicatively coupled to the processor 120 via the I/O subsystem 122, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 120, the memory 124, and other components of the computing device 100. For example, the I/O subsystem 122 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, platform controller hubs, integrated control circuitry, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 122 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 120, the memory 124, and other components of the computing device 100, on a single integrated circuit chip.

The data storage device 126 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. The communication subsystem 128 of the computing device 100 may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications between the computing device 100 and other remote devices over a network. The communication subsystem 128 may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, InfiniBand®, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication.

As shown, the computing device 100 may also include one or more peripheral devices 130. The peripheral devices 130 may include any number of additional input/output devices, interface devices, and/or other peripheral devices. For example, in some embodiments, the peripheral devices 130 may include a display, touch screen, graphics circuitry, keyboard, mouse, speaker system, microphone, network interface, and/or other input/output devices, interface devices, and/or peripheral devices.

Referring now to FIG. 2, in an illustrative embodiment, the computing device 100 establishes an environment 200 during operation. The illustrative environment 200 includes a performance simulator 206, a ground truth manager 210, an error model trainer 216, and an error corrector 224. The various components of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof. As such, in some embodiments, one or more of the components of the environment 200 may be embodied as circuitry or a collection of electrical devices (e.g., performance simulator circuitry 206, ground truth manager circuitry 210, error model trainer circuitry 216, and/or error corrector circuitry 224). It should be appreciated that, in such embodiments, one or more of the performance simulator circuitry 206, the ground truth manager circuitry 210, the error model trainer circuitry 216, and/or the error corrector circuitry 224 may form a portion of one or more of the processor 120, the I/O subsystem 122, the communication subsystem 128, and/or other components of the computing device 100. Additionally, in some embodiments, one or more of the illustrative components may form a portion of another component and/or one or more of the illustrative components may be independent of one another.

The performance simulator 206 is configured to simulate performance of a processor with a simulation model 208 to determine a performance statistic. The performance simulator 206 simulates the performance of a processor architecture during execution of an application, such as one or more training programs 202 or a test program 204. The simulation model 208 may be embodied as an application-level processor architecture performance simulator for a particular simulated processor architecture. The performance statistic may be embodied as, for example, a cycles per instruction value, a floating point operations per second value, a power consumption value, a memory bandwidth value, or other performance statistic generated by the simulation model 208. The programs 202, 204 may be embodied as any executable code, object code, assembly code, or other computer program capable of being executed by the simulated processor architecture. In particular, the programs 202, 204 may be embodied as complete, multi-threaded or multi-process applications that may be executed by multiple processor cores. In some embodiments, the performance simulator 206 may be further configured to store simulation statistics and performance statistics in response to completion of the simulation. In some embodiments, the performance simulator 206 may be configured to simulate performance of the processor for a time interval of an application (e.g., one of the programs 202, 204) with the simulation model 208 to determine a performance statistic for the time interval.

The ground truth manager 210 is configured to collect a ground truth performance statistic of the simulated processor during execution of an application (e.g., the training programs 202). In some embodiments, the ground truth performance statistic may be collected by executing a cycle-accurate simulation of the training program 202 using a cycle-accurate simulator 212. In some embodiments, the ground truth performance statistic may be collected by reading a pre-stored or otherwise predetermined database 214 of cycle-accurate simulation results. In some embodiments, the ground truth performance statistic may be collected by reading a performance counter of a hardware processor 120.

The error model trainer 216 is configured to capture training simulation statistics from the simulation model 208 for the training programs 202 and to train an error model 222 with the training simulation statistics and the ground truth performance statistic. The error model 222 may be embodied as a regression model that models the error of the performance statistic generated by the simulation model 208 as compared to the ground truth performance statistic. The training simulation statistics are used as a feature vector for the error model 222. The training simulation statistics may be embodied as any simulated processor events generated by the simulation model 208. In some embodiments, the error model trainer 216 may be configured to capture the training simulation statistics and train the error model 222 after completion of the simulation of the performance of the processor. In some embodiments, the error model trainer 216 may be configured to capture the training simulation statistics from the simulation model 208 during simulation for a predetermined simulation time interval. In some embodiments, those functions may be performed by one or more sub-components, such as an offline trainer 218 and/or an online trainer 220.
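For concreteness, the relationship between the simulation model 208, the ground truth, and the error model 222 can be written as below. This is a sketch under assumptions that the description leaves open: a single scalar performance statistic (e.g., CPI), an error defined as the additive difference between the ground truth and simulated values, and a squared-error training objective. Here x_i is the feature vector of training simulation statistics for training sample i, y_i^sim and y_i^gt are the simulated and ground truth values of the statistic, and f_θ is the error model with parameters θ.

```latex
\begin{aligned}
e_i &= y_i^{\mathrm{gt}} - y_i^{\mathrm{sim}} \\
\theta^{*} &= \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \bigl( f_{\theta}(\mathbf{x}_i) - e_i \bigr)^{2} \\
\hat{e} &= f_{\theta^{*}}(\mathbf{x}), \qquad y^{\mathrm{corr}} = y^{\mathrm{sim}} + \hat{e}
\end{aligned}
```

A relative error, (y^gt - y^sim) / y^sim, would be an equally plausible choice; the correction step would then become y^corr = y^sim (1 + ê).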

The error model 222 may be embodied as a machine learning regression model, such as a linear regression model (e.g., a Lasso or support vector regression (SVR) model) or an artificial neural network (e.g., a multi-layer perceptron, recurrent neural network, or other network). For example, an artificial neural network may be used when simulating existing hardware, because large amounts of ground truth data may be collected inexpensively from hardware devices, in turn providing large amounts of training data. As another example, a simpler general linear regression model may be used when simulating hypothetical or future hardware, because collecting ground truth data may require expensive cycle-accurate simulation.

The error corrector 224 is configured to capture test simulation statistics from the simulation model 208 for the test program 204 in response to simulating the performance of the processor. The error corrector 224 is further configured to predict an error of the simulation model 208 using the error model 222 with the test simulation statistics as a feature vector and to adjust a test performance statistic for the test program 204 based on the predicted error. In some embodiments, the error corrector 224 may be configured to capture the test simulation statistics and predict the error in response to completing the simulation of the performance of the processor. In some embodiments, the error corrector 224 may be configured to capture the test simulation statistics from the simulation model 208 and predict the error during simulation for a predetermined simulation time interval of the test program 204, and to adapt the simulation model 208 based on the predicted error. In some embodiments, those functions may be performed by one or more sub-components, such as an offline corrector 226, a hybrid corrector 228, and/or an online corrector 230.

Referring now to FIG. 3, in use, the computing device 100 may execute a method 300 for processor simulation modeling. It should be appreciated that, in some embodiments, the operations of the method 300 may be performed by one or more components of the environment 200 of the computing device 100 as shown in FIG. 2. The method 300 begins in block 302, in which the computing device 100 trains the error model 222. In block 304, the computing device 100 simulates performance of a processor architecture using the simulation model 208. The computing device 100 may use the simulation model 208 to simulate one or more of the training programs 202. The simulation model 208 may generate an execution trace or other performance statistics as output based on the training programs 202. For example, cycles per instruction (CPI), power consumption, floating point operations per second (FLOPS), memory bandwidth, or other performance statistics of the simulated processor may be generated. In some embodiments, in block 306 the computing device 100 may use an application-level processor architecture performance simulator. The simulation model 208 may mechanistically or functionally deduce the performance effects of a processor architecture during execution of a multi-core application. The application-level processor architecture performance simulator may approximate or otherwise abstract the operation of various components of the simulated processor in order to reduce simulation time. For example, the simulator may include component models for one or more caches, memory management units, translation lookaside buffers, floating point units, re-order buffers, instruction decoders, mesh networks, or other components of the simulated processor.

In block 308, the computing device 100 captures simulation statistics from the simulation model 208 to use as a feature vector for the error model 222. The simulation statistics may include any simulated processor event or other statistics generated by the simulation model 208 and/or its various subcomponents. As described further below, the feature vector will be used as input to the error model 222. Any such simulation statistics may be used as input features; however, in some embodiments linearly dependent or derived features may be removed to improve training behavior of the error model 222. In some embodiments, the input features may include time-independent activity factors. The simulation statistics may be pre-processed prior to model training. In some embodiments, in block 310, the computing device 100 may normalize aggregated measurements by execution time. For example, the computing device 100 may normalize event counters (such as L1 data cache misses) by execution time. In some embodiments, in block 312 the computing device 100 may standardize the input features to zero mean and unit variance, matching the first two moments of a standard normal distribution.
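The pre-processing in blocks 310 and 312 might look like the following minimal Python sketch. The function names and the dictionary-of-counters representation are hypothetical. Z-scoring gives each feature zero mean and unit variance but does not change the shape of its distribution, and the means and standard deviations fitted on the training samples would be reused unchanged when normalizing test statistics.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize_by_time(counters: Dict[str, float], exec_time: float) -> Dict[str, float]:
    """Block 310 (sketch): turn aggregate event counts into per-unit-time rates."""
    if exec_time <= 0:
        raise ValueError("execution time must be positive")
    return {name: count / exec_time for name, count in counters.items()}


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Block 312 (sketch): per-feature mean and standard deviation over training samples."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # A constant feature has zero spread; divide by 1.0 so it maps to 0 instead of failing.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def apply_standardizer(
    row: Sequence[float], means: Sequence[float], stds: Sequence[float]
) -> List[float]:
    """Z-score one feature vector using statistics fitted on the training set."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]
```

For example, normalize_by_time({"l1d_misses": 500.0, "instructions": 10000.0}, 2.0) yields rates of 250.0 and 5000.0 per unit time; the counter names are illustrative.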

In block 314, the computing device 100 collects ground truth performance statistics for the training programs 202. The ground truth performance statistics represent the performance statistic that will be used to model simulation error of the simulation model 208. For example, the ground truth data may be embodied as CPI, power consumption, FLOPS, memory bandwidth, or other performance statistics corresponding to the performance statistics generated by the simulation model 208. As described further below, the ground truth statistics may be generated by the cycle-accurate simulator 212, by actual hardware, or by any other accurate source. To simplify model training, the computing device 100 may collect a single performance statistic, illustratively cycles per instruction (CPI). Multiple performance statistics may be used with a multi-target learner variant.

In block 316, the computing device 100 trains the error model 222 using the feature vector (which is based on the simulation statistics from the simulation model 208) and the ground truth performance statistics. The computing device 100 trains the error model 222 to predict the error generated by the simulation model 208 as compared to the ground truth when given the simulation statistics as input. The computing device 100 may use any appropriate machine learning algorithm to train the error model 222, such as stochastic gradient descent (SGD).
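Block 316 names stochastic gradient descent without fixing a model form. The sketch below assumes a linear error model, a squared-error loss, and an additive error target (ground truth minus simulated value); the function name, learning rate, and epoch count are illustrative and not taken from the description.

```python
import random
from typing import List, Sequence, Tuple


def train_error_model_sgd(
    features: Sequence[Sequence[float]],
    sim_values: Sequence[float],
    ground_truth: Sequence[float],
    lr: float = 0.05,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit e_hat = w . x + b by per-sample SGD on squared loss.

    The regression target for each sample is the simulator's error,
    assumed here to be ground_truth - simulated value.
    """
    targets = [gt - sim for gt, sim in zip(ground_truth, sim_values)]
    w = [0.0] * len(features[0])
    b = 0.0
    rng = random.Random(seed)  # fixed seed keeps the sample order reproducible
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            x = features[i]
            pred = sum(wj * xj for wj, xj in zip(w, x)) + b
            residual = pred - targets[i]  # gradient of 0.5 * (pred - target)^2 w.r.t. pred
            w = [wj - lr * residual * xj for wj, xj in zip(w, x)]
            b -= lr * residual
    return w, b
```

A linear model keeps the learned weights interpretable: each weight indicates how strongly one (normalized) simulation statistic is associated with the simulator's error.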

Error model training as illustrated in block 302 may be performed in an offline mode or an online mode. Offline model training is performed after completion of one or more simulation runs by the simulation model 208. One potential embodiment of a method for offline model training is described below in connection with FIG. 4. Online model training is performed at certain simulation intervals during a simulation run. One potential embodiment of a method for online model training is described below in connection with FIG. 5.

After training the error model 222, in block 318 the computing device 100 corrects simulated performance using the error model 222. In block 320, the computing device 100 simulates performance of the processor architecture during execution of the test program 204 using the simulation model 208. As described above, the simulation model 208 may generate an execution trace or other performance statistics as output based on the test program 204, including illustratively the CPI for execution of the test program 204. In block 322, the computing device 100 captures simulation statistics from the simulation model 208 to use as a feature vector for the error model 222. The computing device 100 may capture the same types and/or categories of simulation statistics and perform the same normalization used for model training as described above in connection with block 308. In block 324, the computing device 100 predicts the error of the simulation model 208 by inputting the feature vector (which is based on the simulation statistics) to the trained error model 222, which outputs a predicted error. In block 326, the computing device 100 may adjust the output of the simulation model 208 based on the predicted error. The computing device 100 may, for example, adjust a previously output value and/or adapt the execution of the simulation model 208 based on the predicted error.
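Blocks 322 through 326 can be sketched as a single function. This hypothetical sketch assumes a linear error model (weights plus a bias), an additive error definition (ground truth minus simulated value), and z-score normalization whose training-set means and standard deviations are passed in.

```python
from typing import Sequence, Tuple


def correct_simulated_value(
    sim_value: float,
    raw_features: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> Tuple[float, float]:
    """Return (corrected value, predicted error) for one test run or interval."""
    # Block 322: apply the same normalization that was used for training.
    x = [(v - m) / s for v, m, s in zip(raw_features, means, stds)]
    # Block 324: the trained error model predicts the simulator's error.
    predicted_error = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Block 326: adjust the simulator's output by the predicted error.
    return sim_value + predicted_error, predicted_error
```

Returning the predicted error alongside the corrected value keeps it available for the other option in block 326, adapting execution of the simulation model 208 rather than only patching its output.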

Simulation error correction as illustrated in block 318 may be performed in an offline mode, an online mode, or a hybrid mode. Offline simulation error correction is performed after completion of a simulation run and uses an error model 222 that was trained in the offline mode. One potential embodiment of a method for offline simulation error correction is described below in connection with FIG. 6. Hybrid simulation error correction is performed during a simulation run but uses an error model 222 that was trained in the offline mode. Online simulation error correction is performed during a simulation run and uses an error model 222 that was trained in the online mode. One potential embodiment of a method for hybrid/online simulation error correction is described below in connection with FIG. 7. After correcting the simulation error using the error model 222, the method 300 is completed. The computing device 100 may execute the method 300 again, for example to perform additional training and correction.

Although illustrated as performing training and error correction using separate training programs 202 and a separate test program 204, in some embodiments the computing device 100 may perform training and correction with the same program. For example, the computing device 100 may start simulation of a program in the online training mode as described above in connection with block 302. When the error model 222 reaches a certain accuracy threshold, the computing device 100 may switch simulation of the same program to the online error correction mode as described above in connection with block 318. If accuracy of the error model 222 drops below the threshold, the computing device 100 may switch back to the online training mode, and so on.
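The switching policy can be sketched as a small state machine. How the accuracy of the error model 222 is estimated while in the correction mode is not specified; this hypothetical sketch simply assumes that some accuracy estimate in the range 0 to 1 is available for each simulation interval (for example, from occasional spot checks against ground truth).

```python
from enum import Enum


class Mode(Enum):
    ONLINE_TRAINING = "online training"
    ONLINE_CORRECTION = "online error correction"


def next_mode(current: Mode, accuracy: float, threshold: float) -> Mode:
    """Choose the mode for the next simulation interval of the same program."""
    if current is Mode.ONLINE_TRAINING and accuracy >= threshold:
        return Mode.ONLINE_CORRECTION  # model is accurate enough: start correcting
    if current is Mode.ONLINE_CORRECTION and accuracy < threshold:
        return Mode.ONLINE_TRAINING  # accuracy dropped: resume training
    return current


# Example: walk a sequence of per-interval accuracy estimates (illustrative values).
mode = Mode.ONLINE_TRAINING
history = []
for acc in [0.70, 0.85, 0.93, 0.95, 0.88, 0.91]:
    mode = next_mode(mode, acc, threshold=0.9)
    history.append(mode)
```

A single threshold can cause rapid toggling when accuracy hovers near it; separate entry and exit thresholds are one possible refinement.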

Referring now to FIG. 4, in use, the computing device 100 may execute a method 400 for offline error model training. It should be appreciated that, in some embodiments, the operations of the method 400 may be performed by one or more components of the environment 200 of the computing device 100 as shown in FIG. 2. The method 400 begins in block 402, in which the computing device 100 simulates performance of a processor architecture using the simulation model 208 and stores output of the simulation. The computing device 100 may use the simulation model 208 to simulate one of the training programs 202.

In block 404, after completion of the simulation run, the computing device 100 captures simulation statistics of the simulation model 208 as a feature vector for the error model 222. The simulation statistics may include any simulated processor event or other statistics generated by the simulation model 208 and/or its various subcomponents and available after completion of the simulation run. For example, the simulation statistics may include floating point unit occupancy, L2 cache snoop latencies, branch prediction accuracy, or other statistics generated by the simulation model 208 and stored in the results of the simulation. Internal state of the simulation model 208 may not be available for offline training, for example due to storage space constraints. The computing device 100 may normalize or otherwise pre-process the simulation statistics as described above in connection with block 308 of FIG. 3. In some embodiments, in block 406, the computing device 100 may read one or more performance counters established by the simulation model 208. For example, the computing device 100 may read a number of cache misses, instructions executed, or other counters maintained by the simulation model 208.
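Offline feature capture amounts to reading stored simulation results and arranging the selected statistics in a fixed order, so that every training and test sample uses the same feature layout. The sketch below assumes a hypothetical plain-text results format with one "name value" pair per line and optional "#" comments. Real simulators use their own formats, and the statistic names are illustrative.

```python
from typing import Dict, List, Sequence


def parse_stats(text: str) -> Dict[str, float]:
    """Parse 'name value [# comment]' lines from a stored results dump (hypothetical format)."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            continue  # skip non-numeric lines such as section headers
    return stats


def to_feature_vector(stats: Dict[str, float], feature_names: Sequence[str]) -> List[float]:
    """Order selected statistics consistently; absent counters become 0.0 (an assumption)."""
    return [stats.get(name, 0.0) for name in feature_names]
```

Fixing feature_names once, at training time, is what lets the test-time feature vector line up with the trained model's weights.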

In block 408, the computing device 100 collects ground truth performance statistics for the training program 202. In some embodiments, in block 410 the computing device 100 may run the cycle-accurate simulator 212 on the training program 202 and then collect data from one or more performance counters established by the cycle-accurate simulator 212. In some embodiments, the computing device 100 may collect cycle-accurate simulation results from a pre-existing simulation results database 214. Re-using cycle-accurate simulation results may substantially reduce simulation time. In some embodiments, in block 412 the computing device 100 may collect performance counter data from one or more physical hardware components. For example, when simulating an existing processor architecture, the computing device 100 may execute the training program 202 with the processor 120 and collect ground truth data from performance counters of the processor 120. As another example, the computing device 100 may collect ground truth data generated by hardware components of another computing device (e.g., a prototype device or other test device).
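Because cycle-accurate runs are expensive, ground truth retrieval can check the results database 214 before running the cycle-accurate simulator 212. In the hypothetical sketch below, the database is a dictionary keyed by program and architecture configuration, and run_cycle_accurate is a stand-in for whatever launches the cycle-accurate simulator and returns its cycle and instruction counts. CPI is derived as cycles divided by instructions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key: (program identifier, simulated architecture configuration).
GroundTruthKey = Tuple[str, str]


def get_ground_truth_cpi(
    program: str,
    arch_config: str,
    database: Dict[GroundTruthKey, float],
    run_cycle_accurate: Callable[[str, str], Tuple[int, int]],
) -> float:
    """Return ground truth CPI, reusing stored cycle-accurate results when available."""
    key = (program, arch_config)
    if key in database:
        return database[key]  # reuse avoids re-running the slow simulator
    cycles, instructions = run_cycle_accurate(program, arch_config)
    cpi = cycles / instructions
    database[key] = cpi  # store for later training runs
    return cpi
```

The same lookup could front a hardware path (block 412) by passing a run function that reads the processor's cycle and instruction counters instead.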

In block 414, the computing device 100 stores the feature vector and the ground truth performance statistic as a training sample. In block 416, the computing device 100 determines whether to collect additional training samples. For example, the computing device 100 may determine whether additional training programs 202 remain to be executed. If the computing device 100 determines to collect additional samples, the method 400 loops back to block 402. If the computing device 100 determines not to collect any additional samples, the method 400 advances to block 418.

In block 418, the computing device 100 trains the error model 222 using the stored training samples. The computing device 100 trains the error model 222 to predict the error in the performance statistic generated by the simulation model 208 as compared to the ground truth performance statistic, as a function of the feature vector (which is generated from the simulation statistics). As described above, the computing device 100 may use any appropriate machine learning algorithm to train the error model 222, such as stochastic gradient descent (SGD). The computing device 100 may train the error model 222 to a predetermined confidence level, such as a 90% confidence interval. The computing device 100 may also optimize the training algorithm and/or the stored training samples to improve performance of the error model 222. In some embodiments, in block 420 the computing device 100 may perform a hyperparameter search to improve training algorithm performance. In some embodiments, in block 422 the computing device 100 may improve error model 222 performance by performing nested cross-validation.
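Blocks 420 and 422 can be illustrated together with nested cross-validation. An inner k-fold loop selects a hyperparameter, and an outer k-fold loop estimates how well the whole tune-then-train procedure generalizes to held-out samples. For brevity this sketch swaps in ridge regression, solved in closed form, in place of SGD training, and it treats the ridge penalty alpha as the searched hyperparameter. Both choices are assumptions, as are the fold counts; y holds the error targets (ground truth minus simulated values).

```python
from typing import Iterator, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float) -> List[float]:
    """Ridge regression via the normal equations; the last coefficient is an unpenalized bias."""
    rows = [list(r) + [1.0] for r in X]
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j and i < d - 1 else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(gram, rhs)


def mse(coef: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    preds = [sum(c * v for c, v in zip(coef, list(r) + [1.0])) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def kfold(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held-out) index lists for k contiguous folds (assumes k <= n)."""
    for f in range(k):
        held = list(range(f * n // k, (f + 1) * n // k))
        held_set = set(held)
        yield [i for i in range(n) if i not in held_set], held


def subset(seq, idx):
    return [seq[i] for i in idx]


def select_alpha(X, y, alphas, k=3):
    """Inner loop (block 420): pick the penalty with the lowest k-fold validation error."""
    def cv_error(alpha):
        errs = [mse(fit_ridge(subset(X, tr), subset(y, tr), alpha), subset(X, va), subset(y, va))
                for tr, va in kfold(len(y), k)]
        return sum(errs) / len(errs)
    return min(alphas, key=cv_error)


def nested_cv(X, y, alphas, outer_k=3, inner_k=3):
    """Outer loop (block 422): held-out error of the tune-then-train procedure."""
    scores = []
    for tr, te in kfold(len(y), outer_k):
        X_tr, y_tr = subset(X, tr), subset(y, tr)
        alpha = select_alpha(X_tr, y_tr, alphas, inner_k)
        scores.append(mse(fit_ridge(X_tr, y_tr, alpha), subset(X, te), subset(y, te)))
    return sum(scores) / len(scores)
```

Keeping the hyperparameter search inside each outer fold prevents the outer error estimate from being biased by data that was also used for tuning. With real data the folds would likely be split by training program 202, so that intervals of one program do not land on both sides of a split.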

After training the error model 222, the method 400 is completed. The computing device 100 may then use the trained error model 222 to correct simulation error in an offline mode, as described further below in connection with FIG. 6, and/or to correct simulation error in a hybrid mode, as described further below in connection with FIG. 7.

Referring now to FIG. 5, in use, the computing device 100 may execute a method 500 for online error model training. It should be appreciated that, in some embodiments, the operations of the method 500 may be performed by one or more components of the environment 200 of the computing device 100 as shown in FIG. 2. The method 500 begins in block 502, in which the computing device 100 simulates performance of a processor architecture using the simulation model 208 for a simulation time interval of one of the training programs 202. For example, the computing device 100 may simulate a predetermined number of instructions, clock cycles, or other simulation interval of the training program 202.

In block 504, the computing device 100 captures simulation statistics of the simulation model 208 for the simulation interval as a feature vector for the error model 222. The simulation statistics may include any simulated processor event or other statistics generated by the simulation model 208 and/or its various subcomponents and available during the simulation run. In some embodiments, in block 506, the computing device 100 may collect the internal simulator state of the simulation model 208. For example, the computing device 100 may read pipeline stage events (pipe-traces) from the simulation model 208. Of course, the computing device 100 may also collect externally available performance statistics, such as performance counters. The computing device 100 may normalize or otherwise pre-process the simulation statistics as described above in connection with block 308 of FIG. 3.
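One way the internal simulator state of block 506 might be turned into a fixed-length feature vector is to count pipe-trace events per simulated instruction. The sketch below assumes a hypothetical trace format of `(stage, event)` tuples and a hypothetical list of event kinds, `EVENT_KEYS`; a real simulation model 208 would expose its own event names, and per-instruction normalization is only one possible pre-processing choice.

```python
from collections import Counter

# Hypothetical (stage, event) kinds used as error-model features.
EVENT_KEYS = [
    ("fetch", "icache_miss"),
    ("decode", "stall"),
    ("execute", "branch_mispredict"),
    ("memory", "dcache_miss"),
]


def pipe_trace_features(trace, instructions_in_interval):
    """Convert one interval's pipe-trace into a fixed-length feature vector.

    trace: iterable of (stage, event) tuples emitted during the interval.
    Each feature is an event count per simulated instruction, so intervals
    of different lengths stay comparable. Events not in EVENT_KEYS are ignored.
    """
    if instructions_in_interval <= 0:
        raise ValueError("interval must contain at least one instruction")
    counts = Counter(trace)  # missing keys count as zero
    return [counts[key] / instructions_in_interval for key in EVENT_KEYS]
```

Keeping the feature order fixed by `EVENT_KEYS` matters because the error model 222 interprets each position of the vector as the same statistic in training and in use.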

In block 508, the computing device 100 collects ground truth performance statistics for the training program 202. In some embodiments, in block 510 the computing device 100 may run the cycle-accurate simulator 212 for the same interval of the training program 202 that was simulated by the simulation model 208. For example, the computing device 100 may use the cycle-accurate simulator 212 to simulate performance of the same instruction, clock cycle, or other simulation interval that was simulated by the simulation model 208.

In block 512, the computing device 100 trains the error model 222 using the feature vector and the ground truth data. The computing device 100 trains the error model 222 to predict the error in the performance statistic generated by the simulation model 208 as compared to the ground truth performance statistic, as a function of the feature vector (which is generated from the simulation statistics). As described above, the computing device 100 may use any appropriate machine learning algorithm to train the error model 222, such as stochastic gradient descent (SGD). Note that because the feature vector and ground truth data differ between the offline and online modes, the trained error model 222 generated in each mode may also differ.

In block 514, the computing device 100 determines whether to continue training the error model 222. For example, the computing device 100 may determine whether additional instructions remain in the current training program 202 and/or whether additional training programs 202 exist. If the computing device 100 determines to continue training, the method 500 loops back to block 502 to simulate another simulation interval. If the computing device 100 determines not to continue training, the method 500 is completed. The computing device 100 may then use the trained error model 222 to correct simulation error in the online mode, as described further below in connection with FIG. 7.
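A minimal sketch of the online training loop of method 500 follows, assuming a linear error model that takes one SGD step per simulation interval. The callables `simulate_interval` and `cycle_accurate_interval` are hypothetical stand-ins for the simulation model 208 and the cycle-accurate simulator 212: the former is assumed to return the interval's feature vector together with its simulated CPI, and the latter the ground truth CPI for the same interval.

```python
class OnlineErrorModel:
    """Linear error model that takes one SGD step per simulation interval."""

    def __init__(self, n_features, lr=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr

    def predict(self, x):
        """Predicted error (ground truth minus simulated) for feature vector x."""
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, target_error):
        """One SGD step on the squared error for a single interval."""
        residual = self.predict(x) - target_error
        for j, xi in enumerate(x):
            self.weights[j] -= self.lr * residual * xi
        self.bias -= self.lr * residual


def train_online(model, intervals, simulate_interval, cycle_accurate_interval):
    """Method 500: for each interval, simulate it, capture its features,
    obtain ground truth for the same interval, and update the error model."""
    for interval in intervals:  # training stops when no intervals remain (block 514)
        features, simulated_cpi = simulate_interval(interval)  # blocks 502-506
        truth_cpi = cycle_accurate_interval(interval)          # blocks 508-510
        model.update(features, truth_cpi - simulated_cpi)      # block 512
    return model
```

Because each interval is used once and then discarded, no training samples need to be stored, which is the main practical difference from the offline training of method 400.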

Referring now to FIG. 6, in use, the computing device 100 may execute a method 600 for offline simulation error correction. It should be appreciated that, in some embodiments, the operations of the method 600 may be performed by one or more components of the environment 200 of the computing device 100 as shown in FIG. 2. The method 600 begins in block 602, in which the computing device 100 simulates performance of a processor architecture using the simulation model 208 and stores output of the simulation. The computing device 100 may use the simulation model 208 to simulate the test program 204.

In block 604, after completion of the simulation run, the computing device 100 captures simulation statistics of the simulation model 208 as a feature vector for the error model 222. As described above, the simulation statistics may include any simulated processor event or other statistics generated by the simulation model 208 and/or its various subcomponents and available after completion of the simulation run. The computing device 100 may normalize or otherwise pre-process the simulation statistics as described above in connection with block 322 of FIG. 3. In some embodiments, in block 606, the computing device 100 may read one or more performance counters established by the simulation model 208. For example, the computing device 100 may read the number of cache misses, the number of instructions executed, or another counter maintained by the simulation model 208.
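The normalization mentioned above can be as simple as dividing each aggregated counter by the simulated execution time, as in the following sketch. The counter names and the nanosecond time unit are assumptions for illustration, and `counter_features` is a hypothetical helper; the order of `keys` must match the order used when the error model 222 was trained.

```python
def counter_features(counters, execution_time_ns, keys):
    """Normalize aggregated performance counters by simulated execution time.

    counters:          mapping of counter name -> total count for the run
    execution_time_ns: simulated execution time of the run, in nanoseconds
    keys:              counter names, in the order the error model expects
    """
    if execution_time_ns <= 0:
        raise ValueError("execution time must be positive")
    # A counter the simulator did not report contributes a rate of zero.
    return [counters.get(k, 0) / execution_time_ns for k in keys]


counters = {"instructions": 10_000, "cache_misses": 500}  # hypothetical counter names
features = counter_features(counters, 2_000, ["instructions", "cache_misses", "branch_mispredicts"])
print(features)  # [5.0, 0.25, 0.0]
```

Normalizing by time makes runs of different lengths comparable, so an error model trained on long training programs can still be applied to a shorter test program 204.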

In block 608, the computing device 100 predicts the error of the simulation model 208 by inputting the feature vector (which is based on the simulation statistics) to the error model 222, which outputs a predicted error. In block 610, the computing device 100 adjusts the output of the simulation model 208 based on the predicted error. The computing device 100 may adjust a performance statistic generated by the simulation model 208 (e.g., CPI) by the predicted error generated by the error model 222. In some embodiments, in block 612 the computing device 100 may present the adjusted output and an associated confidence indication. The confidence level may be determined during the training phase of the error model 222. For example, in an illustrative embodiment the simulation model 208 may determine an instructions per cycle (IPC) value for the test program 204, which is illustratively the numeric value 0.4. Continuing that example, the error model 222 may be pre-trained with a 90% confidence interval. The pre-trained error model 222 may predict an IPC error of −0.1 based on the simulation statistics from the simulation model 208. Thus, in that example, the computing device 100 may present a simulated IPC of 0.4 together with an error-corrected IPC of 0.3 at a 90% confidence level. After adjusting the simulation output, the method 600 is completed.
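The arithmetic of the IPC example above can be written out as a short sketch. It assumes, consistent with that example, that the error model 222 predicts the ground truth value minus the simulated value, so block 610 reduces to an addition; `correct_output` is a hypothetical helper name.

```python
def correct_output(simulated_value, predicted_error):
    """Block 610: adjust a simulated statistic by the predicted error.

    Assumes the error model predicts (ground truth - simulated), so the
    correction is a simple addition.
    """
    return simulated_value + predicted_error


simulated_ipc = 0.4         # IPC reported by the simulation model for the test program
predicted_ipc_error = -0.1  # output of an error model trained at a 90% confidence level
corrected_ipc = correct_output(simulated_ipc, predicted_ipc_error)
print(f"simulated IPC {simulated_ipc}, corrected IPC {corrected_ipc:.1f} (90% confidence)")
# prints: simulated IPC 0.4, corrected IPC 0.3 (90% confidence)
```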

Referring now to FIG. 7, in use, the computing device 100 may execute a method 700 for hybrid/online simulation error correction. It should be appreciated that, in some embodiments, the operations of the method 700 may be performed by one or more components of the environment 200 of the computing device 100 as shown in FIG. 2. The method 700 begins in block 702, in which the computing device 100 simulates performance of a processor architecture using the simulation model 208 for a simulation time interval of the test program 204. For example, the computing device 100 may simulate a predetermined number of instructions, clock cycles, or other simulation interval.

In block 704, the computing device 100 captures simulation statistics of the simulation model 208 as a feature vector for the error model 222. The simulation statistics may include any simulated processor event or other statistics generated by the simulation model 208 and/or its various subcomponents and available during the simulation run. The computing device 100 may normalize or otherwise pre-process the simulation statistics as described above in connection with block 322 of FIG. 3. In some embodiments, in block 706, the computing device 100 may read one or more performance counters established by the simulation model 208. For example, the computing device 100 may read the number of cache misses, the number of instructions executed, or another counter maintained by the simulation model 208. The computing device 100 may read the performance counters when operating in the hybrid error correction mode, using an error model 222 that was trained in the offline mode as described above in connection with FIG. 4. In some embodiments, in block 708, the computing device 100 may collect the internal simulator state of the simulation model 208. For example, the computing device 100 may read pipeline stage events (pipe-traces) from the simulation model 208. The computing device 100 may collect the internal state when operating in the online error correction mode, using an error model 222 that was trained in the online mode as described above in connection with FIG. 5.

In block 710, the computing device 100 predicts the error of the simulation model 208 by inputting the feature vector (which is based on the simulation statistics) to the error model 222, which outputs a predicted error. In block 712, the computing device 100 adapts the execution of the simulation model 208 based on the predicted error. The computing device 100 may adjust, during simulation, one or more simulation parameters to correct a performance statistic (e.g., CPI) generated by the simulation model 208 based on the predicted error. Thus, the error predicted by the error model 222 may be used as feedback to improve the accuracy of the simulation model 208. In some embodiments, in block 714 the computing device 100 may gradually correct one or more parameters of the simulation model 208 based on the predicted error. In some embodiments, in block 716 the computing device 100 may adjust a time parameter of the simulation model 208, such as a simulated clock interval. For example, the error model 222 may predict an instructions per cycle (IPC) error of +0.1. To adapt to the predicted IPC error, the computing device 100 may turn back the simulation time by a small amount (e.g., a few nanoseconds). However, in some embodiments, it may not be possible to turn back the simulation time of the simulation model 208. In such embodiments, the computing device 100 may instead adjust the simulated clock increment used by the simulation model 208 by a small amount to gradually remove the predicted error. Note that the simulation model 208 may use a simulated clock interval or other time interval that is different from the simulation time interval used by the error model 222.
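A gradual time-parameter correction for block 716 might look like the following sketch. It assumes, hypothetically, that the simulation model derives cycle counts from elapsed simulated time, so shrinking the clock increment lowers the cycle count and raises IPC, and that the predicted error is the ground truth IPC minus the simulated IPC. The name `adapt_clock_increment` and the `gain` value are illustrative, not taken from any particular simulator.

```python
def adapt_clock_increment(increment_ns, simulated_ipc, predicted_error, gain=0.1):
    """Nudge the simulated clock increment to gradually remove an IPC error.

    Assumes cycle counts are derived from elapsed simulated time, so scaling
    the time advanced per step scales the cycle count and IPC moves the
    opposite way. `gain` is the fraction of the full correction applied
    per error-model interval.
    """
    target_ipc = simulated_ipc + predicted_error  # error = ground truth - simulated
    if target_ipc <= 0:
        raise ValueError("predicted error would make IPC non-positive")
    # Same instructions at the target IPC need simulated_ipc / target_ipc as many cycles.
    ideal_scale = simulated_ipc / target_ipc
    # Move only part of the way toward the ideal scale to avoid abrupt jumps.
    return increment_ns * (1.0 + gain * (ideal_scale - 1.0))
```

With a simulated IPC of 0.4 and a predicted error of +0.1, the target IPC is 0.5 and the ideal scale is 0.8; with `gain=0.1`, a 1.0 ns increment becomes 0.98 ns. Applying the function once per error-model interval is intended to approach the predicted ground truth without rewinding simulation time; a smaller gain trades correction speed for stability.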

In block 718, the computing device 100 determines whether to continue simulation. For example, the computing device 100 may determine whether additional instructions remain in the test program 204. If so, the method 700 loops back to block 702 to continue simulating performance of the processor. If the computing device 100 determines not to continue simulation, the method 700 is completed.

It should be appreciated that, in some embodiments, the methods 300, 400, 500, 600, and/or 700 may be embodied as various instructions stored on computer-readable media, which may be executed by the processor 120, the I/O subsystem 122, and/or other components of the computing device 100 to cause the computing device 100 to perform the respective method 300, 400, 500, 600, and/or 700. The computer-readable media may be embodied as any type of media capable of being read by the computing device 100 including, but not limited to, the memory 124, the data storage device 126, firmware devices, and/or other media.

EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

Example 1 includes a computing device for processor performance simulation, the computing device comprising: a performance simulator to simulate performance of a processor for a training program with a simulation model to determine a training performance statistic; a ground truth manager to collect a ground truth performance statistic of the processor for the training program; and an error model trainer to (i) capture training simulation statistics from the simulation model for the training program in response to simulation of the performance of the processor, and (ii) train an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.

Example 2 includes the subject matter of Example 1, and wherein to simulate the performance of the processor comprises to execute an application-level processor architecture performance simulator.

Example 3 includes the subject matter of any of Examples 1 and 2, and wherein the training performance statistic comprises a cycles per instruction value, a floating point operations per second value, a power consumption value, or a memory bandwidth value.

Example 4 includes the subject matter of any of Examples 1-3, and wherein the error model comprises an artificial neural network.

Example 5 includes the subject matter of any of Examples 1-4, and wherein the error model comprises a linear regression model.

Example 6 includes the subject matter of any of Examples 1-5, and wherein to capture the training simulation statistics comprises to normalize an aggregated performance measurement by execution time.

Example 7 includes the subject matter of any of Examples 1-6, and wherein the training simulation statistics are indicative of one or more simulated processor events generated by the simulation model.

Example 8 includes the subject matter of any of Examples 1-7, and further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training of the error model, and (iii) adjust the test performance statistic based on the predicted error.

Example 9 includes the subject matter of any of Examples 1-8, and wherein: the performance simulator is further to (i) complete simulation of the performance of the processor for the training program, and (ii) store the training simulation statistics and the training performance statistic in response to completion of the simulation; and to capture the training simulation statistics comprises to capture the training simulation statistics in response to the completion of the simulation of the performance of the processor.

Example 10 includes the subject matter of any of Examples 1-9, and wherein to capture the training simulation statistics comprises to read a performance counter of the simulation model.

Example 11 includes the subject matter of any of Examples 1-10, and wherein to collect the ground truth performance statistic comprises to execute a cycle-accurate simulation of the training program.

Example 12 includes the subject matter of any of Examples 1-11, and wherein to collect the ground truth performance statistic comprises to read a predetermined database of cycle-accurate simulation results.

Example 13 includes the subject matter of any of Examples 1-12, and wherein to collect the ground truth performance statistic comprises to read a performance counter of a hardware processor.

Example 14 includes the subject matter of any of Examples 1-13, and further comprising an error corrector, wherein: the performance simulator is further to (i) simulate performance of the processor for a test program with the simulation model to determine a test performance statistic and (ii) complete simulation of the performance of the processor for the test program; and the error corrector is to (i) capture test simulation statistics from the simulation model for the test program in response to completion of the simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training of the error model and in response to the completion of the simulation of the performance of the processor for the test program, and (iii) adjust the test performance statistic based on the predicted error.

Example 15 includes the subject matter of any of Examples 1-14, and further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the time interval of the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capture of the test simulation statistics and training of the error model, and (iii) adapt the simulation model based on the predicted error.

Example 16 includes the subject matter of any of Examples 1-15, and wherein: to simulate the performance of the processor for the training program comprises to simulate performance of the processor for a time interval of the training program; to capture the training simulation statistics comprises to capture the training simulation statistics from the simulation model for the time interval; to collect the ground truth performance statistic comprises to collect the ground truth performance statistic for the time interval of the training program; and to train the error model comprises to train the error model in response to simulation of the performance of the processor for the time interval.

Example 17 includes the subject matter of any of Examples 1-16, and wherein to capture the training simulation statistics comprises to capture an internal simulator state of the simulation model.

Example 18 includes the subject matter of any of Examples 1-17, and wherein to collect the ground truth performance statistic comprises to execute a cycle-accurate simulation of the time interval of the training program.

Example 19 includes the subject matter of any of Examples 1-18, and further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the time interval of the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capture of the test simulation statistics, and (iii) adapt the simulation model based on the predicted error.

Example 20 includes the subject matter of any of Examples 1-19, and wherein to adapt the simulation model comprises to gradually correct a parameter of the simulation model based on the predicted error.

Example 21 includes the subject matter of any of Examples 1-20, and wherein to adapt the simulation model comprises to adjust a simulation interval of the simulation model based on the predicted error.

Example 22 includes a method for processor performance simulation, the method comprising: simulating, by a computing device, performance of a processor for a training program with a simulation model to determine a training performance statistic; capturing, by the computing device, training simulation statistics from the simulation model for the training program in response to simulating the performance of the processor; collecting, by the computing device, a ground truth performance statistic of the processor for the training program; and training, by the computing device, an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.

Example 23 includes the subject matter of Example 22, and wherein simulating the performance of the processor comprises executing an application-level processor architecture performance simulator.

Example 24 includes the subject matter of any of Examples 22 and 23, and wherein the training performance statistic comprises a cycles per instruction value, a floating point operations per second value, a power consumption value, or a memory bandwidth value.

Example 25 includes the subject matter of any of Examples 22-24, and wherein the error model comprises an artificial neural network.

Example 26 includes the subject matter of any of Examples 22-25, and wherein the error model comprises a linear regression model.

Example 27 includes the subject matter of any of Examples 22-26, and wherein capturing the training simulation statistics comprises normalizing an aggregated performance measurement by execution time.

Example 28 includes the subject matter of any of Examples 22-27, and wherein the training simulation statistics are indicative of one or more simulated processor events generated by the simulation model.

Example 29 includes the subject matter of any of Examples 22-28, and further comprising: simulating, by the computing device, performance of the processor for a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model; and adjusting, by the computing device, the test performance statistic based on the predicted error.

Example 30 includes the subject matter of any of Examples 22-29, and further comprising: completing, by the computing device, simulation of the performance of the processor for the training program; and storing, by the computing device, the training simulation statistics and the training performance statistic in response to completing the simulation; wherein capturing the training simulation statistics comprises capturing the training simulation statistics in response to completing the simulation of the performance of the processor.

Example 31 includes the subject matter of any of Examples 22-30, and wherein capturing the training simulation statistics comprises reading a performance counter of the simulation model.

Example 32 includes the subject matter of any of Examples 22-31, and wherein collecting the ground truth performance statistic comprises executing a cycle-accurate simulation of the training program.

Example 33 includes the subject matter of any of Examples 22-32, and wherein collecting the ground truth performance statistic comprises reading a predetermined database of cycle-accurate simulation results.

Example 34 includes the subject matter of any of Examples 22-33, and wherein collecting the ground truth performance statistic comprises reading a performance counter of a hardware processor.

Example 35 includes the subject matter of any of Examples 22-34, and further comprising: simulating, by the computing device, performance of the processor for a test program with the simulation model to determine a test performance statistic; completing, by the computing device, simulation of the performance of the processor for the test program; capturing, by the computing device, test simulation statistics from the simulation model for the test program in response to completing simulation of the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model and in response to completing the simulation of the performance of the processor for the test program; and adjusting, by the computing device, the test performance statistic based on the predicted error.

Example 36 includes the subject matter of any of Examples 22-35, and further comprising: simulating, by the computing device, performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics and training the error model; and adapting, by the computing device, the simulation model based on the predicted error.

Example 37 includes the subject matter of any of Examples 22-36, and wherein: simulating the performance of the processor for the training program comprises simulating performance of the processor for a time interval of the training program; capturing the training simulation statistics comprises capturing the training simulation statistics from the simulation model for the time interval; collecting the ground truth performance statistic comprises collecting the ground truth performance statistic for the time interval of the training program; and training the error model comprises training the error model in response to simulating the performance of the processor for the time interval.

Example 38 includes the subject matter of any of Examples 22-37, and wherein capturing the training simulation statistics comprises capturing an internal simulator state of the simulation model.

Example 39 includes the subject matter of any of Examples 22-38, and wherein collecting the ground truth performance statistic comprises executing a cycle-accurate simulation of the time interval of the training program.

Example 40 includes the subject matter of any of Examples 22-39, and further comprising: simulating, by the computing device, performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics; and adapting, by the computing device, the simulation model based on the predicted error.

Example 41 includes the subject matter of any of Examples 22-40, and wherein adapting the simulation model comprises gradually correcting a parameter of the simulation model based on the predicted error.

Example 42 includes the subject matter of any of Examples 22-41, and wherein adapting the simulation model comprises adjusting a simulation interval of the simulation model based on the predicted error.

Example 43 includes a computing device comprising: a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 22-42.

Example 44 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 22-42.

Example 45 includes a computing device comprising means for performing the method of any of Examples 22-42.

Example 46 includes a computing device for processor performance simulation, the computing device comprising: means for simulating performance of a processor for a training program with a simulation model to determine a training performance statistic; means for capturing training simulation statistics from the simulation model for the training program in response to simulating the performance of the processor; means for collecting a ground truth performance statistic of the processor for the training program; and means for training an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.

Example 47 includes the subject matter of Example 46, and wherein the means for simulating the performance of the processor comprises means for executing an application-level processor architecture performance simulator.

Example 48 includes the subject matter of any of Examples 46 and 47, and wherein the training performance statistic comprises a cycles per instruction value, a floating point operations per second value, a power consumption value, or a memory bandwidth value.

Example 49 includes the subject matter of any of Examples 46-48, and wherein the error model comprises an artificial neural network.

Example 50 includes the subject matter of any of Examples 46-49, and wherein the error model comprises a linear regression model.

Example 51 includes the subject matter of any of Examples 46-50, and wherein the means for capturing the training simulation statistics comprises means for normalizing an aggregated performance measurement by execution time.

Example 52 includes the subject matter of any of Examples 46-51, and wherein the training simulation statistics are indicative of one or more simulated processor events generated by the simulation model.

Example 53 includes the subject matter of any of Examples 46-52, and further comprising: means for simulating performance of the processor for a test program with the simulation model to determine a test performance statistic; means for capturing test simulation statistics from the simulation model for the test program in response to simulating the performance of the processor; means for predicting a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model; and means for adjusting the test performance statistic based on the predicted error.

Example 54 includes the subject matter of any of Examples 46-53, and further comprising: means for completing simulation of the performance of the processor for the training program; and means for storing the training simulation statistics and the training performance statistic in response to completing the simulation; wherein the means for capturing the training simulation statistics comprises means for capturing the training simulation statistics in response to completing the simulation of the performance of the processor.

Example 55 includes the subject matter of any of Examples 46-54, andwherein the means for capturing the training simulation statisticscomprises means for reading a performance counter of the simulationmodel.

Example 56 includes the subject matter of any of Examples 46-55, andwherein the means for collecting the ground truth performance statisticcomprises means for executing a cycle-accurate simulation of thetraining program.

Example 57 includes the subject matter of any of Examples 46-56, andwherein the means for collecting the ground truth performance statisticcomprises means for reading a predetermined database of cycle-accuratesimulation results.

Example 58 includes the subject matter of any of Examples 46-57, andwherein the means for collecting the ground truth performance statisticcomprises means for reading a performance counter of a hardwareprocessor.

Example 59 includes the subject matter of any of Examples 46-58, andfurther comprising: means for simulating performance of the processorfor a test program with the simulation model to determine a testperformance statistic; means for completing simulation of theperformance of the processor for the test program; means for capturingtest simulation statistics from the simulation model for the testprogram in response to completing simulation of the performance of theprocessor; means for predicting a predicted error of the simulationmodel using the error model with the test simulation statistics as afeature vector in response to training the error model and in responseto completing the simulation of the performance of the processor for thetest program; and means for adjusting the test performance statisticbased on the predicted error.

Example 60 includes the subject matter of any of Examples 46-59, and further comprising: means for simulating performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; means for capturing test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; means for predicting a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics and training the error model; and means for adapting the simulation model based on the predicted error.

Example 61 includes the subject matter of any of Examples 46-60, and wherein: the means for simulating the performance of the processor for the training program comprises means for simulating performance of the processor for a time interval of the training program; the means for capturing the training simulation statistics comprises means for capturing the training simulation statistics from the simulation model for the time interval; the means for collecting the ground truth performance statistic comprises means for collecting the ground truth performance statistic for the time interval of the training program; and the means for training the error model comprises means for training the error model in response to simulating the performance of the processor for the time interval.

Example 62 includes the subject matter of any of Examples 46-61, and wherein the means for capturing the training simulation statistics comprises means for capturing an internal simulator state of the simulation model.

Example 63 includes the subject matter of any of Examples 46-62, and wherein the means for collecting the ground truth performance statistic comprises means for executing a cycle-accurate simulation of the time interval of the training program.

Example 64 includes the subject matter of any of Examples 46-63, and further comprising: means for simulating performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; means for capturing test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; means for predicting a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics; and means for adapting the simulation model based on the predicted error.

Example 65 includes the subject matter of any of Examples 46-64, and wherein the means for adapting the simulation model comprises means for gradually correcting a parameter of the simulation model based on the predicted error.

Example 66 includes the subject matter of any of Examples 46-65, and wherein the means for adapting the simulation model comprises means for adjusting a simulation interval of the simulation model based on the predicted error.

1. A computing device for processor performance simulation, the computing device comprising: a performance simulator to simulate performance of a processor for a training program with a simulation model to determine a training performance statistic; a ground truth manager to collect a ground truth performance statistic of the processor for the training program; and an error model trainer to (i) capture training simulation statistics from the simulation model for the training program in response to simulation of the performance of the processor, and (ii) train an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.
2. The computing device of claim 1, wherein to simulate the performance of the processor comprises to execute an application-level processor architecture performance simulator.
3. The computing device of claim 1, wherein the training simulation statistics are indicative of one or more simulated processor events generated by the simulation model.
4. The computing device of claim 1, further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training of the error model, and (iii) adjust the test performance statistic based on the predicted error.
5. The computing device of claim 1, wherein: the performance simulator is further to (i) complete simulation of the performance of the processor for the training program, and (ii) store the training simulation statistics and the training performance statistics in response to completion of the simulation; and to capture the training simulation statistics comprises to capture the training simulation statistics in response to the completion of the simulation of the performance of the processor.
6. The computing device of claim 5, further comprising an error corrector, wherein: the performance simulator is further to (i) simulate performance of the processor for a test program with the simulation model to determine a test performance statistic and (ii) complete simulation of the performance of the processor for the test program; and the error corrector is to (i) capture test simulation statistics from the simulation model for the test program in response to completion of the simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training of the error model and in response to the completion of the simulation of the performance of the processor for the test program, and (iii) adjust the test performance statistic based on the predicted error.
7. The computing device of claim 5, further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the time interval of the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capture of the test simulation statistics and training of the error model, and (iii) adapt the simulation model based on the predicted error.
8. The computing device of claim 1, wherein: to simulate the performance of the processor for the training program comprises to simulate performance of the processor for a time interval of the training program; to capture the training simulation statistics comprises to capture the training simulation statistics from the simulation model for the time interval; to collect the ground truth performance statistic comprises to collect the ground truth performance statistic for the time interval of the training program; and to train the error model comprises to train the error model in response to simulation of the performance of the processor for the time interval.
9. The computing device of claim 8, wherein to capture the training simulation statistics comprises to capture an internal simulator state of the simulation model.
10. The computing device of claim 8, further comprising an error corrector, wherein: the performance simulator is further to simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; and the error corrector is to (i) capture test simulation statistics from the simulation model for the time interval of the test program in response to simulation of the performance of the processor, (ii) predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capture of the test simulation statistics, and (iii) adapt the simulation model based on the predicted error.
11. The computing device of claim 10, wherein to adapt the simulation model comprises to gradually correct a parameter of the simulation model based on the predicted error.
12. A method for processor performance simulation, the method comprising: simulating, by a computing device, performance of a processor for a training program with a simulation model to determine a training performance statistic; capturing, by the computing device, training simulation statistics from the simulation model for the training program in response to simulating the performance of the processor; collecting, by the computing device, a ground truth performance statistic of the processor for the training program; and training, by the computing device, an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.
13. The method of claim 12, further comprising: simulating, by the computing device, performance of the processor for a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model; and adjusting, by the computing device, the test performance statistic based on the predicted error.
14. The method of claim 12, further comprising: completing, by the computing device, simulation of the performance of the processor for the training program; and storing, by the computing device, the training simulation statistics and the training performance statistics in response to completing the simulation; wherein capturing the training simulation statistics comprises capturing the training simulation statistics in response to completing the simulation of the performance of the processor.
15. The method of claim 14, further comprising: simulating, by the computing device, performance of the processor for a test program with the simulation model to determine a test performance statistic; completing, by the computing device, simulation of the performance of the processor for the test program; capturing, by the computing device, test simulation statistics from the simulation model for the test program in response to completing simulation of the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model and in response to completing the simulation of the performance of the processor for the test program; and adjusting, by the computing device, the test performance statistic based on the predicted error.
16. The method of claim 14, further comprising: simulating, by the computing device, performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics and training the error model; and adapting, by the computing device, the simulation model based on the predicted error.
17. The method of claim 12, wherein: simulating the performance of the processor for the training program comprises simulating performance of the processor for a time interval of the training program; capturing the training simulation statistics comprises capturing the training simulation statistics from the simulation model for the time interval; collecting the ground truth performance statistic comprises collecting the ground truth performance statistic for the time interval of the training program; and training the error model comprises training the error model in response to simulating the performance of the processor for the time interval.
18. The method of claim 17, further comprising: simulating, by the computing device, performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capturing, by the computing device, test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predicting, by the computing device, a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics; and adapting, by the computing device, the simulation model based on the predicted error.
19. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to: simulate performance of a processor for a training program with a simulation model to determine a training performance statistic; capture training simulation statistics from the simulation model for the training program in response to simulating the performance of the processor; collect a ground truth performance statistic of the processor for the training program; and train an error model with the training simulation statistics and the ground truth performance statistic, wherein the error model comprises a regression model to model an error of the performance statistic generated by the simulation model compared to the ground truth performance statistic, and wherein the training simulation statistics comprise a feature vector for the error model.
20. The one or more computer-readable storage media of claim 19, further comprising a plurality of instructions that in response to being executed cause the computing device to: simulate performance of the processor for a test program with the simulation model to determine a test performance statistic; capture test simulation statistics from the simulation model for the test program in response to simulating the performance of the processor; predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model; and adjust the test performance statistic based on the predicted error.
21. The one or more computer-readable storage media of claim 19, further comprising a plurality of instructions that in response to being executed cause the computing device to: complete simulation of the performance of the processor for the training program; and store the training simulation statistics and the training performance statistics in response to completing the simulation; wherein to capture the training simulation statistics comprises to capture the training simulation statistics in response to completing the simulation of the performance of the processor.
22. The one or more computer-readable storage media of claim 21, further comprising a plurality of instructions that in response to being executed cause the computing device to: simulate performance of the processor for a test program with the simulation model to determine a test performance statistic; complete simulation of the performance of the processor for the test program; capture test simulation statistics from the simulation model for the test program in response to completing simulation of the performance of the processor; predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to training the error model and in response to completing the simulation of the performance of the processor for the test program; and adjust the test performance statistic based on the predicted error.
23. The one or more computer-readable storage media of claim 21, further comprising a plurality of instructions that in response to being executed cause the computing device to: simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capture test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics and training the error model; and adapt the simulation model based on the predicted error.
24. The one or more computer-readable storage media of claim 19, wherein: to simulate the performance of the processor for the training program comprises simulating performance of the processor for a time interval of the training program; to capture the training simulation statistics comprises capturing the training simulation statistics from the simulation model for the time interval; to collect the ground truth performance statistic comprises collecting the ground truth performance statistic for the time interval of the training program; and to train the error model comprises training the error model in response to simulating the performance of the processor for the time interval.
25. The one or more computer-readable storage media of claim 24, further comprising a plurality of instructions that in response to being executed cause the computing device to: simulate performance of the processor for a time interval of a test program with the simulation model to determine a test performance statistic; capture test simulation statistics from the simulation model for the time interval of the test program in response to simulating the performance of the processor; predict a predicted error of the simulation model using the error model with the test simulation statistics as a feature vector in response to capturing the test simulation statistics; and adapt the simulation model based on the predicted error.
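The interval-based variant (claims 7, 10, 11, 16, 18, 23 and 25) adapts the simulation model while it runs instead of correcting a final statistic. The following is only a toy sketch under loudly stated assumptions, none of which come from the claims: the simulation model is a hypothetical one-parameter interval model in which CPI = base_cpi + miss_rate * mem_latency / 100; the error model is an already-trained linear predictor whose weights are supplied by hand; "gradually correct a parameter" is read as a damped Newton-style step on the hypothetical `mem_latency` parameter; and the simulation interval is adjusted by a simple halve/double rule. All names (`ToyIntervalModel`, `LinearErrorModel`, `run_adaptive`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearErrorModel:
    """Stand-in for an already-trained regression error model (weights assumed)."""
    bias: float
    weights: List[float]

    def predict(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * f for w, f in zip(self.weights, features))


@dataclass
class ToyIntervalModel:
    """Hypothetical simulation model: CPI = base_cpi + miss_rate * mem_latency / 100."""
    mem_latency: float  # tunable simulation model parameter (assumed)
    base_cpi: float = 1.0

    def simulate_interval(self, miss_rate: float) -> Tuple[float, List[float]]:
        cpi = self.base_cpi + miss_rate * self.mem_latency / 100.0
        # Simulation statistics captured for this interval (the feature vector).
        features = [miss_rate, miss_rate * self.mem_latency / 100.0]
        return cpi, features


def run_adaptive(model: ToyIntervalModel, error_model: LinearErrorModel,
                 miss_rates: Sequence[float], gain: float = 0.5,
                 threshold: float = 0.05, interval: int = 1000,
                 min_interval: int = 250, max_interval: int = 8000):
    """Simulate one interval per miss rate, adapting the model after each one."""
    history = []
    for miss_rate in miss_rates:
        cpi, features = model.simulate_interval(miss_rate)
        predicted_error = error_model.predict(features)  # ground truth minus simulated
        # Gradual correction: move mem_latency only a fraction `gain` of the way
        # toward the value that would cancel the predicted CPI error.
        sensitivity = miss_rate / 100.0  # d(CPI)/d(mem_latency) in this toy model
        if sensitivity > 0:
            model.mem_latency += gain * predicted_error / sensitivity
        # Interval adjustment: shorter intervals while the predicted error is large.
        if abs(predicted_error) > threshold:
            interval = max(min_interval, interval // 2)
        else:
            interval = min(max_interval, interval * 2)
        history.append((cpi, predicted_error, model.mem_latency, interval))
    return history


if __name__ == "__main__":
    # Assumed ground truth: the real processor behaves as if mem_latency were 200,
    # so the ideal error is 2 * miss_rate - miss_rate * mem_latency / 100.
    error_model = LinearErrorModel(bias=0.0, weights=[2.0, -1.0])
    model = ToyIntervalModel(mem_latency=100.0)
    for cpi, err, latency, next_interval in run_adaptive(model, error_model, [0.1] * 8):
        print(f"cpi={cpi:.4f} err={err:+.4f} latency={latency:.1f} next={next_interval}")
```

With `gain` below 1, a single noisy prediction cannot swing the parameter all the way; in this toy the gap between `mem_latency` and its assumed true value halves every interval (100, 50, 25, ...), and the interval length shrinks while the predicted error is large and grows back once it falls under the threshold.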